Search for: All records
Total Resources: 5
- Author / Contributor:
- Ghosh, Nikhil (5)
- Yu, Bin (4)
- Hayou, Soufiane (3)
- Belkin, Mikhail (1)
- Frei, Spencer (1)
- Ha, Wooseok (1)
In this paper, we study the role of initialization in Low Rank Adaptation (LoRA) as originally introduced in Hu et al. [19]. Essentially, to start finetuning from the pretrained model, one can either initialize B to zero and A to random (the default initialization in the PEFT package), or vice versa. In both cases, the product BA is equal to zero at initialization, so finetuning starts from the pretrained model. These two initialization schemes are seemingly similar and should in principle yield the same performance and share the same optimal learning rate. We demonstrate that this intuition is incorrect and that the first scheme (initializing B to zero and A to random) on average yields better performance than the other. Our theoretical analysis suggests that the reason is that the first initialization allows the use of larger learning rates (without causing output instability) than the second, resulting in more efficient learning under the first scheme. We validate our results with extensive experiments on LLMs.
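The two initialization schemes described in the abstract can be made concrete with a short sketch. The following is a minimal PyTorch-style illustration, not code from the paper or the PEFT library; the class name LoRALinear, the init_scheme flag, and the default hyperparameter values are illustrative assumptions.

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Frozen pretrained linear layer with a low-rank update: W x + (alpha / r) * B A x."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0,
                 init_scheme: str = "init_A"):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # the pretrained weights stay frozen

        d_out, d_in = base.weight.shape
        self.scaling = alpha / r
        self.lora_A = nn.Parameter(torch.empty(r, d_in))
        self.lora_B = nn.Parameter(torch.empty(d_out, r))

        if init_scheme == "init_A":
            # Scheme 1: A random, B zero, so BA = 0 at initialization.
            nn.init.kaiming_uniform_(self.lora_A, a=5 ** 0.5)
            nn.init.zeros_(self.lora_B)
        elif init_scheme == "init_B":
            # Scheme 2: B random, A zero; BA = 0 at initialization as well.
            nn.init.zeros_(self.lora_A)
            nn.init.kaiming_uniform_(self.lora_B, a=5 ** 0.5)
        else:
            raise ValueError(f"unknown init_scheme: {init_scheme!r}")

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Either way, the model starts exactly at the pretrained function since BA = 0;
        # the two schemes differ only in how the finetuning dynamics unfold afterwards.
        return self.base(x) + self.scaling * (x @ self.lora_A.T) @ self.lora_B.T


# Example: wrap a pretrained projection with each initialization scheme.
layer = nn.Linear(768, 768)
lora_default = LoRALinear(layer, r=8, init_scheme="init_A")  # B = 0, A random
lora_swapped = LoRALinear(layer, r=8, init_scheme="init_B")  # A = 0, B random
```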
- Hayou, Soufiane; Ghosh, Nikhil; Yu, Bin (ICML)
- Hayou, Soufiane; Ghosh, Nikhil; Yu, Bin (arXiv)
- Ghosh, Nikhil; Belkin, Mikhail (arXiv)
- Ghosh, Nikhil; Frei, Spencer; Ha, Wooseok; Yu, Bin (14th Annual Workshop on Optimization for Machine Learning (NeurIPS 2022 Workshop))